Crowdsourcing platforms offer a practical solution to the problem of affordably annotating large datasets for training supervised classifiers. Unfortunately, poor worker performance frequently threatens to compromise annotation reliability, and requesting multiple labels for every instance can lead to large cost increases without guaranteeing good results. Minimizing the required training samples using an active learning selection procedure reduces the labeling requirement but can jeopardize classifier training by focusing on erroneous annotations. This paper presents an active learning approach in which worker performance, task difficulty, and annotation reliability are jointly estimated and used to compute the risk function guiding the sample selection procedure. We demonstrate that the proposed approach, which employs active learning with Bayesian networks, significantly improves training accuracy and correctly ranks the expertise of unknown labelers in the presence of annotation noise.